Mining statistically significant substrings using the chi-square statistic



Similar resources

Mining Statistically Significant Substrings using the Chi-Square Statistic

The problem of identification of statistically significant patterns in a sequence of data has been applied to many domains such as intrusion detection systems, financial models, web-click records, automated monitoring systems, computational biology, cryptology, and text analysis. An observed pattern of events is deemed to be statistically significant if it is unlikely to have occurred due to ra...


Mining Statistically Significant Patterns using the Chi-Square Statistic

Statistical significance is used to ascertain whether the outcome of a given experiment can be ascribed to some extraneous factors or is solely due to chance. An observed pattern of events is deemed to be statistically significant if it is unlikely to have occurred due to randomness or chance alone. In the thesis, we study the problem of identifying the statistically relevant patterns in string...


Mining Statistically Significant Substrings Based on the Chi-Square Measure

Given the vast reservoirs of data stored worldwide, efficient mining of data from a large information store has emerged as a great challenge. Many databases like that of intrusion detection systems, web-click records, player statistics, texts, proteins etc., store strings or sequences. Searching for an unusual pattern within such long strings of data has emerged as a requirement for diverse app...


Your Chi-Square Test is Statistically Significant: Now What?

Applied researchers have employed chi-square tests for more than one hundred years. This paper addresses the question of how one should follow a statistically significant chi-square test result in order to determine the source of that result. Four approaches were evaluated: calculating residuals, comparing cells, ransacking, and partitioning. Data from two recent journal articles were used to i...


Most Significant Substring Mining Based on Chi-square Measure

Given the vast reservoirs of sequence data stored worldwide, efficient mining of string databases such as intrusion detection systems, player statistics, texts, proteins, etc. has emerged as a great challenge. Searching for an unusual pattern within long strings of data has emerged as a requirement for diverse applications. Given a string, the problem then is to identify the substrings that dif...



Journal

Journal title: Proceedings of the VLDB Endowment

Year: 2012

ISSN: 2150-8097

DOI: 10.14778/2336664.2336677